Phase 1c: Bias-corrected local-linear CI (CCT 2014) #340
Port nprobust::lprobust's single-eval-point path (lprobust.R:177-246) as
the foundation for paper Equation 8:
- diff_diff/_nprobust_port.py: add `lprobust()` + `LprobustResult`. Uses
the Calonico-Cattaneo-Titiunik (2014) bias-combined design matrix `Q.q`
to produce classical and bias-corrected point estimates along with naive
and robust (CCT 2014) standard errors in a single pass.
- diff_diff/local_linear.py: add `bias_corrected_local_linear()` +
`BiasCorrectedFit`. Public wrapper returns the mu-scale CI
`[tau.bc +/- z_{1-alpha/2} * se.rb]`. Auto-bandwidth path delegates to
`mse_optimal_bandwidth` and honors nprobust's rho=1 default (b = h).
Also extracts shared `_validate_had_inputs` helper from Phase 1b.
- benchmarks/R/generate_nprobust_lprobust_golden.R + golden JSON: 5 DGPs
(Uniform, Beta(2,2), half-normal, clustered, shifted-boundary); R's
z = qnorm(1-alpha/2) exported so Python skips ppf and matches bit-wise
on CI arithmetic.
- Tests: TestLprobustSingleEval (8 port-level) + test_bias_corrected_lprobust
(29 wrapper-level). Tiered tolerances per plan: 1e-12 on tau/se,
1e-13 on CI bounds; clustered DGP 4 hits bit-parity (1e-14).
Known deviations from nprobust (in REGISTRY.md): hc2/hc3 + cluster
raises (nprobust silently accepts); clustered DGP 4 uses manual h=b=0.3
to sidestep an nprobust-internal singleton-cluster bug.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
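Illustrative note on the CI arithmetic described above: a minimal sketch of `[tau.bc +/- z_{1-alpha/2} * se.rb]`, assuming the bias-corrected estimate and robust standard error are already available as plain floats. The function name `bias_corrected_ci` and the input values are hypothetical, and the standard library's `NormalDist` stands in for whatever quantile routine the package uses.

```python
from statistics import NormalDist


def bias_corrected_ci(tau_bc, se_rb, alpha=0.05):
    """Return (lower, upper) for tau_bc +/- z_{1-alpha/2} * se_rb.

    Hypothetical helper for illustration; not the package's API.
    """
    # Two-sided normal critical value, e.g. about 1.96 for alpha = 0.05.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * se_rb
    return tau_bc - half_width, tau_bc + half_width


# Made-up inputs, chosen only to show the call shape.
low, high = bias_corrected_ci(tau_bc=0.40, se_rb=0.10)
print(low, high)
```

The interval is centered on the bias-corrected estimate but scaled by the robust standard error, which is the pairing the commit message attributes to CCT (2014).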
|
Overall Assessment
⛔ Blocker: one P0 inference bug and one P1 methodology/parameter-propagation bug remain in the new Phase 1c path.
Executive Summary
Methodology
Code Quality
No additional findings beyond the inference bug above.
Performance
No findings.
Maintainability
No findings.
Tech Debt
Security
No findings.
Documentation/Tests
Path to Approval
|
P0: `bias_corrected_local_linear` now routes the CI through `safe_inference()` so degenerate cases with `se_robust <= 0` or non-finite `se_robust` (e.g., exact-fit / constant-y) return `(NaN, NaN)` rather than a misleading zero-width or infinite CI. Matches the repo-wide inference contract (CLAUDE.md Key Design Pattern #6).
P1: Auto-bandwidth path now calls `lpbwselect_mse_dpi` directly with `cluster`, `vce`, and `nnmatch` forwarded. Previously it went through `mse_optimal_bandwidth`, which hard-codes unclustered / nn / nnmatch=3, silently mismatching the downstream `lprobust` fit's reported estimator.
Tests added: TestNaNSafeCI (constant-y + near-zero-SE) and TestAutoBandwidthForwardsParameters (cluster+auto, vce='hc1'+auto, nnmatch=5+auto), all asserting the selected bandwidth changes when the corresponding parameter changes (catches silent fallback).
Also: suppress spurious BLAS FPE warnings in lprobust_bw's hc/hc2/hc3 branch (numpy issue #21432 pattern), newly reachable via the wired-through vce='hc1' auto-bandwidth path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
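Illustrative note on the P0 fix: a minimal sketch of a NaN-safe CI gate, assuming the rule is exactly what the commit message states (a non-finite or non-positive robust SE yields `(NaN, NaN)`). The function `nan_safe_ci` is hypothetical and is not the repository's `safe_inference()`; `NormalDist` is a standard-library stand-in for the quantile call.

```python
import math
from statistics import NormalDist


def nan_safe_ci(estimate, se, alpha=0.05):
    """Return a CI, or (nan, nan) when the standard error is unusable.

    Hypothetical sketch of the gating rule; not the repo's implementation.
    """
    # Degenerate fits (exact fit, constant y) can give se == 0 or a
    # non-finite se; report NaN bounds instead of a zero-width CI.
    if not math.isfinite(se) or se <= 0.0:
        return (math.nan, math.nan)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return (estimate - z * se, estimate + z * se)


print(nan_safe_ci(1.0, 0.0))   # degenerate: zero standard error
print(nan_safe_ci(1.0, 0.25))  # regular case
```

Returning NaN for both bounds makes the degenerate case visible to downstream code, whereas a zero-width interval would look like an unusually precise estimate.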
|
/ai-review
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
CI review follow-up: the floating-dtype-only missing-ID guard in `bias_corrected_local_linear` and `lprobust` let object-dtype arrays with `None` / object `np.nan` sentinels bypass validation. The downstream `lprobust_vce` cluster loop would then group on `np.unique`, treating the sentinel as a real cluster and silently misstating clustered SE.
Extract the dtype-agnostic `_cluster_has_missing` helper already used inside `lpbwselect_mse_dpi` and apply it at all three entry points: the selector, the port-level `lprobust`, and the public `bias_corrected_local_linear` wrapper.
Regression tests: object-dtype cluster arrays with None and with np.nan sentinels raise a targeted ValueError at both the wrapper (test_bias_corrected_lprobust) and the port (test_nprobust_port) entry points.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
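Illustrative note on the missing-ID guard: a minimal sketch of a dtype-agnostic check, assuming "missing" means `None` or a float NaN as described above. The names `cluster_has_missing` and `validate_cluster` are hypothetical and are not the repository's `_cluster_has_missing`; the check walks elements one by one so it does not depend on the container's dtype.

```python
import math


def cluster_has_missing(cluster_ids):
    """Return True if any cluster ID is None or a float NaN.

    Hypothetical sketch; works on lists and on array-likes that iterate
    element by element, regardless of dtype.
    """
    for value in cluster_ids:
        if value is None:
            return True
        # Only floats can be NaN; strings and ints fall through.
        if isinstance(value, float) and math.isnan(value):
            return True
    return False


def validate_cluster(cluster_ids):
    """Raise a targeted error before any grouping on cluster IDs happens."""
    if cluster_has_missing(cluster_ids):
        raise ValueError("cluster contains missing IDs (None or NaN)")
    return list(cluster_ids)


print(cluster_has_missing(["a", None, "b"]))
print(cluster_has_missing([1, 2, 3]))
```

Running the check at every entry point, rather than only in the selector, keeps a sentinel from being counted as a real cluster further down the call chain.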
|
/ai-review
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
✅ Looks good: this re-review resolves the prior P1 on object-dtype missing cluster IDs, keeps the earlier CI/inference fix in place, and I did not find any new unmitigated P0/P1 issues in the changed files. Review was static only because this sandbox does not have
Executive Summary
Methodology
Affected methods: Calonico-Cattaneo-Titiunik (2014) robust bias correction via
Code Quality
No findings.
Performance
No findings.
Maintainability
No findings.
Tech Debt
Security
No findings.
Documentation/Tests
|
Docstring note from AI review. The wrapper's docstring still said auto mode chooses both h and b via `mse_optimal_bandwidth`, but the implementation now calls `_nprobust_port.lpbwselect_mse_dpi` directly (to forward cluster / vce / nnmatch) and then sets `b = h` per nprobust's rho=1 default. Update the docstrings to reflect that.
Behavior is unchanged; this is a comment-only clarification.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
✅ Looks good: this re-review resolves the prior P1 surfaces I checked, and I did not find any unmitigated P0/P1 issues in the changed files. The only remaining item is a P3 documentation/test-contract mismatch. This was a static review only because the sandbox does not have
Executive Summary
Methodology
I did not find any unmitigated P0/P1 methodology defects in the changed estimator path. The prior cluster-missing, NaN-safe CI, and auto-forwarding findings appear addressed in diff_diff/_nprobust_port.py:L87-L111, diff_diff/_nprobust_port.py:L1138-L1154, diff_diff/local_linear.py:L1063-L1083, diff_diff/local_linear.py:L1189-L1203, diff_diff/local_linear.py:L1103-L1121, tests/test_bias_corrected_lprobust.py:L313-L342, tests/test_bias_corrected_lprobust.py:L430-L518, and tests/test_nprobust_port.py:L591-L620.
Code Quality
No findings.
Performance
No findings.
Maintainability
No findings.
Tech Debt
Security
No findings.
Documentation/Tests
Review note: I could not execute the new tests in this sandbox because
…tion)
P3 doc-accuracy note from AI review: the registry, R generator, and local_linear.py comments said the Python side "consumes R's z directly" but the wrapper actually computes its own z via scipy.stats.norm.ppf inside safe_inference(). The golden JSON's z field is an audit/reference export so a reviewer can confirm R's qnorm and Python's scipy.stats.norm.ppf agree to machine precision. The parity harness compares Python-computed CI bounds to R-computed CI bounds (not R's z fed into Python's arithmetic).
Behavior is unchanged; only doc phrasing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
✅ Looks good: I did not find any unmitigated P0/P1 issues in the changed files. The affected methods are the CCT (2014) robust bias-corrected local-polynomial estimator, the CCF (2018) MSE-DPI selector used in auto mode, and the de Chaisemartin et al. (2026) Equation 8 μ-scale CI wrapper as recorded in
Executive Summary
Review note: this was a static review. I could not execute the new parity tests in this sandbox because the available Python environment is missing
Methodology
No unmitigated P0/P1 methodology defects found in the changed estimator path.
Code Quality
No findings.
Performance
No findings.
Maintainability
No findings.
Tech Debt
Security
No findings.
Documentation/Tests
|
P3 follow-up from AI review. Three small inconsistencies to resolve:
1. `bias_corrected_local_linear` docstring still described tau_cl/se_cl as bit-parity and said Python consumes R's z directly. The actual contract is atol=1e-12 on all four scalars (DGP 1-3) and the wrapper computes its own z via scipy.stats.norm.ppf; R's qnorm is stored in the JSON for audit only. Docstring updated to match.
2. Committed golden JSON metadata still had the old "consume R's critical value directly" string because the generator was edited without regenerating. Regenerated so JSON metadata matches the corrected audit-export wording in the R script.
3. Parity tests for DGP 4 and DGP 5 did not assert CI bounds. Added ci_low / ci_high assertions at the same tolerance as the corresponding se_rb assertion (bit-parity for DGP 4, 1e-12 for DGP 5), so the audit surface matches what the registry states.
Behavior unchanged; tests strengthened and docs aligned.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
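Illustrative note on the tiered parity contract: a minimal sketch of an absolute-tolerance comparison between Python-computed and R-computed scalars, assuming both sides are available as flat dictionaries. The key names, the helper `parity_failures`, and all numbers are hypothetical; they do not reproduce the golden JSON layout or any real fit.

```python
import math

# Scalars compared per DGP (key names assumed for illustration).
SCALARS = ("tau_cl", "tau_bc", "se_cl", "se_rb", "ci_low", "ci_high")


def parity_failures(python_values, r_values, atol):
    """Return the names of scalars that differ by more than atol."""
    return [
        name
        for name in SCALARS
        # rel_tol=0 makes this a pure absolute-tolerance check.
        if not math.isclose(
            python_values[name], r_values[name], rel_tol=0.0, abs_tol=atol
        )
    ]


# Made-up values: the Python side drifts by 5e-13 on one CI bound.
r_side = {"tau_cl": 0.41, "tau_bc": 0.40, "se_cl": 0.09,
          "se_rb": 0.10, "ci_low": 0.20, "ci_high": 0.60}
py_side = dict(r_side, ci_low=0.20 + 5e-13)

print(parity_failures(py_side, r_side, atol=1e-12))  # within the looser tier
print(parity_failures(py_side, r_side, atol=1e-14))  # flagged at the tighter tier
```

Asserting the CI bounds alongside the point estimates and standard errors means a drift in the critical value cannot hide behind matching `se_rb` values.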
|
/ai-review
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
Address CI review P1 on HC2/HC3 leverage. nprobust's lprobust reuses the p-fit hat-matrix leverage for the q-fit residual path (lprobust.R:229-241). The Python port matches R exactly, but that R behavior deserves a derivation and a dedicated golden anchor before the public wrapper can advertise CCT-2014 conformance on hc-mode paths. The Phase 1c plan always said hc0-3 were "exposed but not golden-tested"; this commit restricts the public surface to what actually has a parity anchor.
Changes:
- `bias_corrected_local_linear`: `vce != "nn"` now raises `NotImplementedError` with a pointer to the port-level `diff_diff._nprobust_port.lprobust` for callers who still need the broader surface. Docstring updated.
- Tests: swap positive-path hc-mode tests for negative-path raises; drop `test_auto_vce_hc1_returns_finite` (now superseded by `test_auto_vce_hc1_rejected_in_phase_1c`); drop the cluster+hc2/hc3 tests (now covered by the plain hc2/hc3 rejection tests).
- REGISTRY.md: swap "Deviation from R" wording for a clearer "public-API surface restriction" note explaining the R-side hii reuse and the deferred derivation.
- TODO.md: upgrade the hc-mode expansion TODO from Low to Medium and document the q-fit leverage derivation as the gating work.
Also fix the P3 tolerance-doc inconsistency: registry/docstring now say atol=1e-12 on all six scalars across unclustered DGPs (tests actually assert 1e-12, with DGP 1 / DGP 3 landing closer to 1e-13 in practice).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
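Illustrative note on the surface restriction: a minimal sketch of a guard that rejects every `vce` other than `"nn"`, assuming the check runs before any fitting. The function `require_nn_vce` and its message text are hypothetical; only the exception type and the pointer to the port-level `lprobust` come from the commit message.

```python
def require_nn_vce(vce):
    """Reject variance estimators without a golden parity anchor.

    Hypothetical guard; the real wrapper's message and placement may differ.
    """
    if vce != "nn":
        raise NotImplementedError(
            f"vce={vce!r} is not available in the public wrapper; "
            "use diff_diff._nprobust_port.lprobust for hc-mode paths."
        )
    return vce


print(require_nn_vce("nn"))

try:
    require_nn_vce("hc1")
except NotImplementedError as exc:
    print("rejected:", exc)
```

Raising `NotImplementedError` rather than `ValueError` signals that the input is meaningful but deferred, which fits the TODO entry tracking the hc-mode expansion.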
|
/ai-review
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
✅ Looks good
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
P3 follow-up from AI review. The class-level scope note still claimed HC0-3 were "exposed here", and the Raises section listed the old `ValueError` for `vce='hc2'/'hc3'` with cluster. Both are stale: the wrapper now raises `NotImplementedError` for any `vce != "nn"` regardless of cluster. Updated the scope note and Raises section to reflect the current behavior.
Behavior unchanged; docstring-only.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
✅ Looks good
No unmitigated P0/P1 findings. This re-review’s prior documentation mismatch is resolved, and the remaining Phase 1c limitations are either documented in the Methodology Registry or explicitly tracked in
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
Summary
- Port nprobust's `lprobust()` single-eval-point path (lprobust.R:177-246) into `diff_diff/_nprobust_port.py` as `lprobust()` + `LprobustResult`. Uses the Calonico-Cattaneo-Titiunik (2014) bias-combined design matrix `Q.q` to compute classical and bias-corrected point estimates along with naive and robust standard errors in a single pass.
- Add `bias_corrected_local_linear()` + `BiasCorrectedFit` in `diff_diff/local_linear.py`. Returns the μ-scale CI `[tau.bc ± z_{1-α/2} * se.rb]` for paper Equation 8. Auto-bandwidth path delegates to `mse_optimal_bandwidth` and matches nprobust's `rho=1` default (`b = h`).
- Extract a shared `_validate_had_inputs` helper so Phase 1b and Phase 1c enforce identical HAD-scope input rules (empty / non-finite / negative-dose / off-support-boundary / mass-point / Design 1' plausibility).
- Add `benchmarks/R/generate_nprobust_lprobust_golden.R` + JSON on 5 DGPs (uniform, Beta(2,2), half-normal, clustered, shifted-boundary); R's `z = qnorm(1-α/2)` exported so Python consumes the exact critical value, eliminating `ppf`/`qnorm` ULP drift on CI bounds.
Methodology references (required if estimator / math changes)
- `docs/methodology/papers/dechaisemartin-2026-review.md`; nprobust 0.5.0 (SHA `36e4e53`), `R/nprobust/R/lprobust.R:177-246`.
- (a) `vce in {"hc2","hc2_bm"}` combined with `cluster=` raises `ValueError`; nprobust silently accepts the combination but it is not a well-defined estimator (hc2/hc3 assume independent observations). (b) Auto mode uses `b = h` (matching nprobust's `rho=1` default) rather than Phase 1b's distinct `b_mse`; the paper itself uses a single `h*_G` throughout Equation 8, and `b_mse` is still surfaced via `bandwidth_diagnostics` for inspection. (c) Clustered golden DGP 4 uses manual `h=b=0.3` to sidestep an nprobust-internal singleton-cluster shape bug in `lpbwselect.mse.dpi`'s pilot fits; the Python port has no equivalent bug. All three deviations are documented in `docs/methodology/REGISTRY.md` under the Phase 1c entry.
Validation
- `tests/test_nprobust_port.py::TestLprobustSingleEval` (8 port-level tests exercising the CCT-2014 math with R-supplied bandwidths); `tests/test_bias_corrected_lprobust.py` (29 wrapper-level tests: parity × 5 DGPs, CI behavior, full input contract, parameter interactions, end-to-end with Phase 1b selector, validator idempotence). Parity at tiered tolerances per `docs/methodology/REGISTRY.md` Phase 1c entry: `atol=1e-12` on tau/se outputs, `atol=1e-13` on CI bounds (R's `z` exported in golden JSON); clustered DGP 4 hits bit-parity at `atol=1e-14`.
Security / privacy
🤖 Generated with Claude Code